22 research outputs found

    Low-frequency variant detection in viral populations using massively parallel sequencing data

    QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles

    Background: Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. Results: For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNVD). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNVHS). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNVHS-P80). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNVD performed similarly to the existing approaches. QQ-SNVHS was more sensitive on all test sets but with more false positives. QQ-SNVHS-P80 was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5 %, QQ-SNVHS-P80 revealed a sensitivity of 100 % (vs. 40-60 % for the existing methods) and a specificity of 100 % (vs. 98.0-99.7 % for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. 
Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5 % were consistently detected by QQ-SNVHS-P80 from different generations of Illumina sequencers. Conclusions: We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
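
The three calling modes described in this abstract can be sketched as a simple decision rule. This is only an illustration under stated assumptions: the function names `percentile` and `call_snvs`, the dictionary keys, the inclusive cutoff, and the nearest-rank percentile are hypothetical choices, not the QQ-SNV implementation. The SNV probabilities are taken as given rather than computed from a fitted logistic regression.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def call_snvs(candidates, error_freqs, mode="D"):
    """Apply a probability cutoff and, optionally, a P80 frequency filter.

    candidates  : list of dicts with hypothetical keys 'pos', 'prob', 'freq'
    error_freqs : frequencies of presumed sequencing errors (assumed input)
    mode        : 'D' (cutoff 0.5), 'HS' (cutoff 0.0001) or 'HS-P80'
    """
    if mode not in ("D", "HS", "HS-P80"):
        raise ValueError("mode must be 'D', 'HS' or 'HS-P80'")
    cutoff = 0.5 if mode == "D" else 0.0001
    # Assumption: the cutoff is inclusive; the abstract does not say.
    called = [c for c in candidates if c["prob"] >= cutoff]
    if mode == "HS-P80":
        p80 = percentile(error_freqs, 80)
        # Overrule calls whose frequency lies below the 80th percentile
        # of the error-frequency distribution.
        called = [c for c in called if c["freq"] >= p80]
    return [c["pos"] for c in called]
```

With a lower probability cutoff more candidates pass, which is why the high-sensitivity mode trades specificity for sensitivity until the percentile filter is added.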

    Using transcriptomics to guide lead optimization in drug discovery projects: lessons learned from the QSTAR project

    The pharmaceutical industry is faced with steadily declining R&D efficiency, which results in fewer drugs reaching the market despite increased investment. A major cause of this low efficiency is the failure of drug candidates in late-stage development owing to safety issues or previously undiscovered side effects. We analyzed to what extent gene expression data can help to de-risk drug development in early phases by detecting the biological effects of compounds across disease areas, targets and scaffolds. For eight drug discovery projects within a global pharmaceutical company, gene expression data were informative and able to support go/no-go decisions. Our studies show that gene expression profiling can detect adverse effects of compounds, and is a valuable tool in early-stage drug discovery decision making.

    BIGL: Biochemically Intuitive Generalized Loewe null model for prediction of the expected combined effect compatible with partial agonism and antagonism

    Clinical efficacy regularly requires the combination of drugs. For an early estimation of the clinical value of (potentially many) combinations of pharmacologic compounds during discovery, the observed combination effect is typically compared to that expected under a null model. Mechanistic accuracy of that null model is not aspired to; on the contrary, combinations that deviate favorably from the model (and thereby disprove its accuracy) are prioritized. Arguably the most popular null model is the Loewe Additivity model, which conceptually maps any assay under study to a (virtual) single-step enzymatic reaction. It is easy to interpret and requires no information other than the concentration-response curves of the individual compounds. However, the original Loewe model cannot accommodate concentration-response curves with different maximal responses and, as a consequence, combinations of an agonist with a partial or inverse agonist. We propose an extension, named Biochemically Intuitive Generalized Loewe (BIGL), that can address different maximal responses, while preserving the biochemical underpinning and interpretability of the original Loewe model. In addition, we formulate statistical tests for detecting synergy and antagonism, which allow for detecting statistically significant greater/lesser observed combined effects than expected from the null model. Finally, we demonstrate the novel method through application to several publicly available datasets.
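
For orientation, the classical Loewe Additivity model that this abstract starts from can be sketched as follows. This is not the BIGL extension. It assumes both compounds follow Hill curves that run from 0 to the same maximal response of 1, which is exactly the restriction the abstract says BIGL relaxes. The function names and the bisection solver are illustrative choices.

```python
def inverse_hill(effect, ec50, hill):
    """Dose of a single compound that yields `effect` (a fraction in (0, 1))."""
    return ec50 * (effect / (1.0 - effect)) ** (1.0 / hill)


def loewe_expected_effect(d1, d2, ec50_1, hill_1, ec50_2, hill_2, tol=1e-12):
    """Expected combined effect y under classical Loewe additivity.

    Solves d1 / D1(y) + d2 / D2(y) = 1 by bisection, where Di(y) is the
    dose of compound i alone that produces effect y.
    """
    if d1 <= 0 and d2 <= 0:
        return 0.0
    lo, hi = 1e-15, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        index = (d1 / inverse_hill(mid, ec50_1, hill_1)
                 + d2 / inverse_hill(mid, ec50_2, hill_2))
        if index > 1.0:
            lo = mid  # the doses are more than enough for `mid`: y is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A quick sanity check is the sham combination: combining a compound with itself at doses 0.5 and 0.5 should give the same effect as a single dose of 1.0.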

    ViVaMBC: estimating viral sequence variation in complex populations from Illumina deep-sequencing data using model-based clustering

    Background: Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Results: Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. Conclusions: ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. 
They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently, a lot of information is already contained in the quality scores, enabling the model-based clustering procedure to adjust for the majority of the sequencing errors. Overall, the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low-frequency variant detection.
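
The link between quality scores and error probabilities that this abstract relies on is the standard Phred relationship, Q = -10 log10(p). A minimal sketch follows. The ASCII offset of 33 is an assumption about the FASTQ encoding of the input, and nothing here reproduces ViVaMBC's own error model.

```python
def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10.0 ** (-q / 10.0)


def decode_quality_string(qual, offset=33):
    """Convert a FASTQ quality string to integer Phred scores.

    Assumes Phred+33 encoding; pass a different offset for other encodings.
    """
    return [ord(ch) - offset for ch in qual]


def error_probs(qual, offset=33):
    """Per-base error probabilities for one read's quality string."""
    return [phred_to_error_prob(q) for q in decode_quality_string(qual, offset)]
```

A score of 30 corresponds to an error probability of 1 in 1,000, which is of the same order as the variant frequencies being targeted and explains why the quality scores carry so much of the signal.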

    Design and synthesis of potential β-turn mimetics: applications in peptide chemistry and in the development of small molecules

    In recent years, a synthesis of 2(1H)-pyrazinones was developed in our laboratory. These systems were subsequently elaborated into amino(oxo)piperidinecarboxylate (APC) systems via intermolecular Diels-Alder reactions. In this work, new classes of APC systems are discussed and the previously developed chemistry is further exploited in the synthesis of biologically relevant molecules. In the first part, we discuss the strategy followed in the development and synthesis of new APC systems with potentially β-turn-inducing properties. β-turns are important secondary structure elements in peptides and are involved in numerous molecular recognition processes. Mimetics of β-turns are used, among other things, to elucidate the receptor-bound conformations of peptides. In developing these new systems, the amino(oxo)piperidinecarboxylate was retained as the core skeleton, since we know that it is responsible for the β-turn-inducing properties of the existing systems. However, additional rigidity is sought in the target molecules. Molecular rigidity can have a positive influence on oral bioavailability and on the selectivity of receptor binding. The suitability of the new APC systems as β-turn mimetics was analyzed beforehand through extensive molecular modelling. In a subsequent phase of the research, these compounds were synthesized starting from 2(1H)-pyrazinones. The ease with which these pyrazinones can be functionalized makes it possible to mimic a whole range of natural and non-natural amino acids in the target systems. The key step in the synthesis is an endo-selective, intramolecular Diels-Alder reaction. In this step the dipeptide system is conformationally constrained and the cis relationship between amine and carboxylate is ensured. The final step in the development of the dipeptide analogues was the methanolysis of the endo-lactam function. In this methanolysis, a lack of selectivity was observed in one case and a lack of reactivity in the other. At this point in the research, it was decided to study the factors that determine reactivity and selectivity in the methanolysis reactions of the 2,5-diazabicyclo[2.2.2]octane-3,6-diones obtained via an intermolecular Diels-Alder reaction. Thanks to the results of this study, a new synthetic route could be devised for analogous APC systems in which the exo-lactam function of intramolecular [4+2] cycloadducts is now cleaved selectively (cleavage of the endo-lactam function is prevented by N-alkylation or by a bulky group in the α-position). This new strategy gives rise to a new class of cis-fused APC systems. The β-turn-inducing properties of these products were also demonstrated via molecular modelling and NMR analysis of a model compound. In the second part, the β-turn-inducing properties of the APC systems obtained via an intermolecular Diels-Alder reaction are validated, and alternative functionalization methods for the APC systems are tested with a view to the possible development of small combinatorial libraries. In the first chapter, a methodology was established for incorporating the APC systems into peptides via solid-phase chemistry (Fmoc strategy). A functionalized APC system was then built into endomorphin-2 at the position of a possible cis-peptide bond in the bioactive conformation. In the second chapter, an attempt was made to elaborate both the N- and the C-terminus of these systems in alternative ways, with a view to the development of small combinatorial libraries. For the N-terminus, formation of an amide bond proved to be the most suitable method. For the C-terminus, we chose to reduce the ester function to an alcohol and to alkylate it further. In the final part, the APC system was used as a template for carrying out macrocyclization reactions. This methodology was then applied to a functionalized APC system to determine whether this scaffold, embedded in a macrocyclic structure, possesses the minimal structure needed to mimic the activity of the cyclic casomorphin derivative 2.

    Principal bicorrelation analysis: unraveling associations between three data sources

    In this article, we propose a statistical explorative method for data integration. It is developed in the context of early drug development for which it enables the detection of chemical substructures and the identification of genes that mediate their association with the bioactivity (BA). The core of the method is a sparse singular value decomposition for the identification of the gene set and a permutation-based method for the control of the false discovery rate. The method is illustrated using a real dataset, and its properties are empirically evaluated by means of a simulation study. Quantitative Structure Transcriptional Activity Relationship (QSTAR, www.qstar-consortium.org) is a new paradigm in early drug development that extends QSAR by not only considering data on the chemical structure of the compounds and on the compound-induced BA, but by simultaneously using transcriptomics data (gene expression). This approach enables, for example, the detection of chemical substructures that are associated with BA, while at the same time a gene set is correlated with both these substructures and the BA. Although causal associations cannot be formally concluded, these associations may suggest that the compounds act on the BA through a particular genomic pathway

    Automated quality control tool for high-content imaging data by building 2D prediction intervals on reference biosignatures

    Recent advances in automated microscopy and image analysis enable quantitative profiling of cellular phenotypes (Cell Painting). This paves the way for studying the broad effects of chemical perturbations on biological systems at large scale during lead optimization. Comparison of perturbation biosignatures with biosignatures of annotated compounds can inform on both on- and off-target effects. When building databases with phenotypic profiles of thousands of compounds, it is vital to control the quality of Cell Painting assays over time. To our knowledge, no tool for this yet exists within the imaging community. In this paper, we introduce an automated tool to assess the quality of Cell Painting assays by quantifying the reproducibility of biosignatures of annotated reference compounds. The tool learns the biosignature of those treatments from a historical dataset and subsequently builds a two-dimensional probabilistic quality control (QC) limit. The limit is then used to detect aberrations in new Cell Painting experiments. The tool is illustrated using simulated data and further demonstrated on Cell Painting data of the A549 cell line. In general, the tool provides a sensitive, detailed and easy-to-interpret mechanism to validate the quality of Cell Painting assays.
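
One simple way to picture a two-dimensional probabilistic QC limit is an ellipse around historical reference points. The sketch below is a simplified stand-in, not the published tool. It assumes each experiment is summarised by two hypothetical scores that are roughly bivariate normal, and it ignores the uncertainty in the estimated mean and covariance, which a proper prediction region would account for.

```python
import math


def fit_reference(points):
    """Mean and 2x2 sample covariance of historical (x, y) score pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of `point` from the reference cloud."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def within_qc_limit(point, mean, cov, level=0.95):
    """True if `point` falls inside the elliptical limit at `level`.

    Uses the closed-form quantile of the chi-square distribution with
    two degrees of freedom: -2 * ln(1 - level).
    """
    limit = -2.0 * math.log(1.0 - level)
    return mahalanobis_sq(point, mean, cov) <= limit
```

A new experiment whose reference-compound scores fall outside the ellipse would be flagged for review rather than rejected automatically.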

    Statistical detection of synergy: New methods and a comparative study

    Combination therapies are increasingly adopted as the standard of care for various diseases to improve treatment response, minimise the development of resistance and/or minimise adverse events. Therefore, synergistic combinations are screened early in the drug discovery process, in which their potential is evaluated by comparing the observed combination effect to that expected under a null model. Such methodology is implemented in the BIGL R-package which allows for a quick screening of drug combinations. We extend the meanR and maxR tests from this package by allowing non-constant variance of the responses and by extending the list of null models (Loewe, Loewe2, HSA, Bliss). These new tests are evaluated in a comprehensive simulation study under various models for additivity and synergy, various monotherapeutic dose–response models (complete, partial and incomplete responders) and various types of deviation from the constant variance assumption. In addition, the BIGL package is extended with bootstrap confidence intervals for the individual off-axis points and for the overall synergy strength, which were demonstrated to have reliable coverage and can complement the existing tests. We conclude that the differences in performance between the different null models are small and depend on the simulation scenario. As a result, the choice of null model should be driven by expert knowledge on the particular problem. Finally, we demonstrate the new features of the BIGL package and the difference between the synergy models on a real dataset from drug discovery. The BIGL package is available at CRAN (https://CRAN.R-project.org/package=BIGL) and as a Shiny app (https://synergy.openanalytics.eu/app)
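
Two of the null models named in this abstract, Bliss independence and highest single agent (HSA), have simple closed forms when effects are expressed as fractions between 0 and 1 with larger values meaning a stronger effect. The sketch below uses that convention as an assumption. The function names are illustrative, and the code does not reproduce the BIGL package interface or its meanR and maxR tests.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined effect if the two compounds act independently."""
    return effect_a + effect_b - effect_a * effect_b


def hsa_expected(effect_a, effect_b):
    """Expected combined effect under the highest single agent model."""
    return max(effect_a, effect_b)


def excess_over_null(observed, effect_a, effect_b, null_model="bliss"):
    """Observed minus expected effect; positive values suggest synergy."""
    if null_model == "bliss":
        expected = bliss_expected(effect_a, effect_b)
    elif null_model == "hsa":
        expected = hsa_expected(effect_a, effect_b)
    else:
        raise ValueError("null_model must be 'bliss' or 'hsa'")
    return observed - expected
```

Because HSA never expects more than the better monotherapy, the same observation yields a larger excess under HSA than under Bliss, which illustrates why the choice of null model matters for interpretation.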
